
10/550788 

JC12Rec'dPCT/PTC 2 8 SEP 2005 



DESCRIPTION 



Method of cDNA Synthesis 






Technical Field 



The invention of this application relates to a method of cDNA synthesis. More particularly, the invention of this application relates to a novel, simple and highly efficient method for synthesizing cDNA possessing a consecutive sequence starting with a nucleotide adjacent to a cap structure of mRNA.



Background Art

Genome projects have determined almost all sequences of genome DNA (chromosome DNA) that covers all genetic information of various organisms including human, mouse, rice, nematode, yeast and so on. The entire sequence of these genomes is expected to give us information on the primary structure of proteins encoded by genes and information on the expression regulation regions (promoter, enhancer, suppressor etc.) that regulate the expression of the genes. In order to extract these two kinds of information from the genome sequence, the sequence information of mRNA transcribed from the gene locus of chromosome DNA is crucial. In order to analyze the sequence of mRNA, DNA complementary to mRNA (complementary DNA: cDNA) has usually been used. In particular, in order to obtain the foregoing two kinds of information, it is necessary to obtain cDNA (full-length cDNA) synthesized from mRNA that is correctly transcribed from the gene transcription region and contains an entire protein-coding region.

Usually, full-length cDNA must meet two requirements. One is to possess a sequence starting with a transcription start site on the genome DNA. A "cap structure" is added to the 5' end of mRNA that is properly transcribed from the transcription start site. This cap structure is 7-methylguanosine (m7G) connected to the transcription-start-site nucleotide via a 5'-5' triphosphate linkage. The cDNA complementary to the mRNA possessing this cap structure meets one requirement for full-length cDNA. The other requirement is the presence of a "poly(A) tail" of mRNA. This poly(A) tail is a consecutive sequence of several tens to 200 adenines (A) that is added to the 3' end of mRNA in the nucleus after transcription of genome DNA. Therefore, cDNA correctly synthesized from an mRNA template possessing both the cap structure at the 5' end and the poly(A) tail at the 3' end meets the two requirements for full-length cDNA (starting with a transcription start site and encompassing an entire protein-coding region).

The cDNA can be synthesized by a reverse transcriptase reaction using mRNA as a template, but it is difficult to synthesize full-length cDNA, because mRNA transcribed from chromosome DNA is exposed to various degradation reactions in cells, during extraction from cells, or during synthesis of the DNA strand. The reverse transcription reaction using mRNA as a template synthesizes a DNA strand (the first-strand cDNA) toward the 5' direction of mRNA from a primer oligonucleotide that is annealed with the 3' end of mRNA. Thus, when the primer (oligo dT) is annealed with a poly(A) tail, it is easy to obtain cDNA covering the poly(A) tail. However, this method does not guarantee the synthesis of full-length cDNA possessing a sequence extending from the primer to the cap structure, because degradation of mRNA and/or interruption of the synthesis reaction of the DNA strand frequently occur. In fact, most of the vast number of ESTs (expressed sequence tags) reported so far were derived from incomplete cDNAs generated from degraded mRNA or incomplete cDNAs generated by interruption of the synthesis reaction of the DNA strand.

Therefore, many methods have been proposed to synthesize full-length cDNA possessing a sequence extending to the cap structure that exists at the 5' end of mRNA. These methods are classified into the following four main categories based on the principle used.



(1) Tailing method 



This method is based on the addition of a homo-oligomer tail, using terminal transferase, to the first-strand cDNA extended to the cap structure. The Okayama-Berg method (Non-patent Document 1) and the Pruitt method (Non-patent Document 2) are included in this category. Since it is difficult to strictly control the number of nucleotides in the added tail, this method has the problem that an excessively long tail makes nucleotide sequence analysis difficult.

The template-switching method (Patent Document 1), which uses a dC tail added to the 3' end of the first-strand cDNA by the terminal transferase activity of reverse transcriptase, is also included in this tailing method. The number of added dC residues was described as 3 to 5 in the reference (Non-patent Document 3).

(2) Linker-ligation method 

This method comprises synthesis of the first-strand cDNA, removal of mRNA by alkaline or RNase H treatment, and ligation of a single-stranded oligonucleotide linker of known sequence to the 3' end of the single-stranded cDNA using T4 RNA ligase (Non-patent Document 4). This method is unsuitable for preparing a high-quality cDNA library because secondary structure forms in the single-stranded cDNA.

(3) Oligo-capping method

This method is based on the replacement of the cap structure with an oligomer. Methods using an RNA oligomer (Non-patent Document 5) or a DNA-RNA chimeric oligomer (for example, Patent Document 1 by the inventors of this application, Non-patent Document 6) have been reported. In principle this method should produce only full-length cDNAs, but it also produces some truncated cDNAs synthesized from degraded mRNAs that arise during the many steps of mRNA treatment, and in addition it requires a large amount of poly(A)+ RNA, about 5-10 µg. The use of total RNA as a starting material to suppress the degradation of mRNA has been reported to improve the full-length rate to more than 90%, but the number of reaction steps is unchanged (Patent Document 3).

This method includes the method (Patent Document 3) in which a synthetic oligomer is added to the cap structure after opening its carbohydrate ring by a periodate oxidation reaction.

(4) Cap-trapping method 

This method is based on selecting mRNAs possessing a cap structure and using them as a template. It includes the method using mRNA selected by an anti-cap antibody as a template (Non-patent Document 7) and the method using biotinylated mRNA that is prepared by adding biotin to the open ring generated by periodate oxidation of the carbohydrate of the cap structure and is selected with an avidin-immobilized carrier (Non-patent Document 8).

Patent Document 1: US patent 5,962,272
Patent Document 2: 3337748
Patent Document 3: WO 01/04286
Patent Document 4: US patent 6,022,715
Non-patent Document 1: Okayama, H. and Berg, P. Mol. Cell. Biol. 2:161-170, 1982.
Non-patent Document 2: Pruitt, S.C. Gene 66:121-134, 1988.
Non-patent Document 3: CLONTECHniques, July 1997, p.26.
Non-patent Document 4: Edwards, J., Delort, J., and Mallet, J. Nucleic Acids Res. 19:5227-5232, 1991.
Non-patent Document 5: Maruyama, K. and Sugano, S. Gene 138:171-174, 1994.
Non-patent Document 6: Kato et al., Gene 150:243-250, 1994.
Non-patent Document 7: Edery, I., Chu, L.L., Sonenberg, N., and Pelletier, J. Mol. Cell. Biol. 15:3363-3371, 1995.
Non-patent Document 8: Carninci et al., Genomics 37:327-336, 1996.

Disclosure of Invention

Any of the foregoing conventional methods enables us to synthesize full-length cDNA. However, even if the synthesized cDNAs contain full-length cDNA at high rates, they inevitably include incomplete cDNAs derived from degraded mRNA and/or incomplete cDNAs produced by interruption of cDNA synthesis. Therefore, it is necessary to determine whether or not the synthesized cDNA is derived from full-length mRNA possessing a cap structure. In general, if multiple clones possessing the same 5'-terminal sequence exist, it is highly probable, but not conclusive, that these clones are derived from full-length mRNA. In particular, in the case of genes possessing multiple transcription start sites, it is very difficult to determine whether a cDNA clone is derived from full-length mRNA or from degraded mRNA lacking a 5' end.

Therefore, a method for synthesizing cDNA by which we can synthesize full-length cDNA at high rates and determine whether or not it possesses a sequence starting with a transcription start site has been desired.

Also, each of the foregoing conventional methods has the problem that it requires many processes. For example, the oligo-capping method described in Patent Document 1 is superior with respect to its ability to reliably synthesize cDNA possessing a nucleotide sequence starting with a cap site, but it requires 8 processes to synthesize cDNA. The increase in the number of processes causes problems such as a decrease in synthetic yield and increases in time, labor, and cost.

Furthermore, some conventional methods contain an amplification process by PCR (Non-patent Documents 4 and 5). Thus, these methods have the problem that the generated cDNA sequence contains artificial mutations, because the DNA polymerase used for PCR frequently incorporates a nucleotide different from the template nucleotide during the polymerase reaction.

Therefore, a method for synthesizing full-length cDNA from a small amount of RNA in a small number of processes without using PCR has been desired.

The invention of this application has been made under the foregoing circumstances, and makes it an object to provide a cDNA synthesis method satisfying the following requirements:

(1) to use total RNA of one to several micrograms as a starting material;

(2) to use no PCR;

(3) to consist of as few processes as possible;

(4) to synthesize full-length cDNA that is guaranteed to possess a consecutive sequence starting with a transcription-start-site nucleotide in a high yield of more than 90%.

No conventional method satisfies all of these requirements. 

The first invention to solve the foregoing problem is a method for synthesizing cDNA possessing a consecutive sequence starting with a nucleotide adjacent to a cap structure of mRNA, which method comprises the processes of:

(i) annealing a double-stranded DNA primer and an RNA mixture containing mRNA possessing a cap structure,

(ii) preparing a conjugate of an mRNA/cDNA heteroduplex and the double-stranded DNA primer by synthesizing the first-strand cDNA primed with the double-stranded DNA primer using reverse transcriptase, and

(iii) circularizing the conjugate of the mRNA/cDNA heteroduplex and the double-stranded DNA primer by joining the 3' and 5' ends of the DNA strand containing cDNA using ligase.

In the method of this first invention, a preferred aspect is that mRNA possessing a cap structure is contained in a cell extract, or that mRNA possessing a cap structure is synthesized by in vitro transcription.

Also, in the method of this first invention, a preferred aspect is that the primer sequence of the double-stranded DNA primer contains a sequence complementary to a partial sequence of mRNA possessing a cap structure or an oligo dT complementary to a poly(A) sequence of mRNA possessing a cap structure.

Furthermore, in the method of this first invention, a preferred aspect is that the ligase is T4 RNA ligase.

In the method of this first invention, another preferred aspect is that it comprises the following process between the processes (ii) and (iii):

(ii') generating a 5'-protruding end or a blunt end at the terminal of the double-stranded DNA primer by cutting the conjugate of the mRNA/cDNA heteroduplex and the double-stranded DNA primer using a restriction enzyme.

The second invention is a method for synthesizing cDNA, which comprises the following process in addition to the method of the foregoing first invention:

(iv) synthesizing a second-strand cDNA by replacing an RNA strand with a DNA strand in the conjugate of the mRNA/cDNA heteroduplex and the double-stranded DNA primer.

In the method of this second invention, a preferred aspect is that the double-stranded DNA primer contains a replication origin or both a replication origin and a promoter for cDNA expression.

The method of this second invention provides a clone containing the synthesized 
double-stranded cDNA. 



In the method of this second invention, another preferred aspect is to include the following process:

(v) incorporating the double-stranded cDNA composed of the first-strand cDNA and the second-strand cDNA into a vector DNA.

This process enables us to clone the synthesized double-stranded cDNA into a vector.

The third invention is a cDNA library that is a population of clones containing double-stranded cDNA synthesized by the method of the foregoing second invention, in which more than 60% of the cDNA clones possess a 5'-end nucleotide sequence of (dT)ndG (n=0-5) followed by a consecutive sequence starting with a nucleotide adjacent to a cap structure of mRNA.

The fourth invention is a method for selecting a cDNA clone possessing a consecutive sequence starting with a nucleotide adjacent to a cap structure of mRNA from the clones in the cDNA library of the foregoing third invention, wherein a cDNA clone possessing a 5'-end nucleotide sequence of (dT)ndG (n=0-5) is selected as an objective clone.
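The selection criterion of the fourth invention and the 60% threshold of the third invention can be illustrated with a minimal sketch. It assumes that each clone is represented by the 5'-end sequence of its insert, read 5' to 3' on the second strand immediately after the vector junction; the function names and the example reads are hypothetical and are not taken from the Examples.

```python
import re

# Assumption: a read is the insert sequence directly downstream of the
# vector junction, written 5'->3' in second-strand (sense) orientation.
CAP_SIGNATURE = re.compile(r"^T{0,5}G")


def has_cap_signature(insert_5p: str) -> bool:
    """Return True if the 5' end matches (dT)n dG with n = 0-5."""
    return CAP_SIGNATURE.match(insert_5p.upper()) is not None


def select_candidates(clones: dict[str, str]) -> list[str]:
    """Return the names of clones whose 5' end carries the signature."""
    return [name for name, seq in clones.items() if has_cap_signature(seq)]


def signature_fraction(clones: dict[str, str]) -> float:
    """Fraction of clones in the library that carry the signature."""
    if not clones:
        return 0.0
    return len(select_candidates(clones)) / len(clones)


if __name__ == "__main__":
    # Hypothetical reads, for illustration only.
    library = {
        "clone_a": "GACTGGA",     # dG, n = 0
        "clone_b": "TTGACCA",     # (dT)2 dG
        "clone_c": "ACGTTGA",     # no signature
        "clone_d": "TTTTTTGAC",   # six dT: outside n = 0-5
    }
    print(select_candidates(library))
    print(signature_fraction(library) > 0.60)
```

A prefix test of this kind cannot by itself distinguish an added dG from a G that belongs to the transcript, so in practice the result would presumably be checked against the genome sequence; that caveat concerns this sketch and is not a statement from the specification.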

The fifth invention is a double-stranded DNA primer possessing an oligo (dT)n (n=15-100) as a primer part, in which one terminal part on the primer side has an 8-base recognition restriction enzyme site RE1, and the other terminal part has an 8-base recognition restriction enzyme site RE2 and a restriction enzyme site RE3 generating a 5'-protruding end or a blunt end.

A preferred aspect of the fifth invention is that the double-stranded DNA primer contains a replication origin or both a replication origin and a promoter for cDNA expression. An example of the double-stranded DNA primer of the fifth invention is a vector primer derived from pGCAP10 comprising the nucleotide sequence of SEQ ID NO: 2.

The sixth invention is a reagent kit for cDNA synthesis, which comprises a 
double-stranded DNA primer, reverse transcriptase and its reaction buffer solution, T4 
RNA ligase and its reaction buffer solution, and model mRNA possessing a cap 
structure. 

The foregoing invention is a method for synthesizing cDNA possessing a 
consecutive sequence starting with a nucleotide adjacent to a cap structure of mRNA in 
high yield, which method comprises at least the following three processes: 

(i) annealing of a double-stranded primer and mRNA, 

(ii) preparation of a conjugate of a mRNA/cDNA heteroduplex and the 
double-stranded DNA primer by synthesizing the first-strand cDNA, 

(iii) joining the 3' and 5' ends of a DNA strand containing cDNA in the conjugate of the 
mRNA/cDNA heteroduplex and the double-stranded DNA primer. 

This invention has been completed by finding that, when mRNA possessing a cap structure was used as a template and the base of the cap was "G", "dC" [or 5'-dC(dA)n-3' (n=1-5)] was added to the 3' end of the first-strand cDNA by the foregoing process (ii). Since "dT" [or 5'-dT(dA)n-3' (n=1-5)] was added to the 3' end of the first-strand cDNA when the base of the cap was "A", the added nucleotide was shown to have the base complementary to that of the cap structure. Also, no addition of an extra nucleotide to the 3' end of the first-strand cDNA was observed when RNA not possessing a cap structure was used as a template. Thus, when an extra "dG" [or 5'-(dT)ndG-3' (n=1-5)] exists at the 5' end of the cDNA, we can decide that this cDNA is full-length cDNA derived from mRNA possessing a cap structure. Since more than one extra "dG" can be added depending on the reaction conditions of reverse transcriptase (non-patent reference), if (dN)ndG (dN is dT or dG, n=0-5) exists at the 5' end of the cDNA, we can generally decide that this cDNA is full-length cDNA derived from mRNA possessing a cap structure. However, when more than one extra "dG" is added, it is difficult to decide which "dG" is the extra one. Thus, it is preferable to perform the reaction under conditions for adding one extra "dG", as shown in Examples.
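The ambiguity that arises when several dG residues precede the transcript can be made concrete with a small sketch. It assumes a read written 5' to 3' on the second strand and, as a simplification, ignores any dT residues in front of the dG run; the function names and example sequences are hypothetical.

```python
def leading_g_run(cdna_5p: str) -> int:
    """Count the consecutive G residues at the 5' end of the read."""
    seq = cdna_5p.upper()
    return len(seq) - len(seq.lstrip("G"))


def candidate_start_sites(cdna_5p: str) -> list[tuple[int, str]]:
    """List (number of extra dG, remaining transcript sequence) pairs.

    Every split of the leading G run is equally consistent with the read,
    because an added dG cannot be told apart from a templated G by the
    sequence of the read alone.
    """
    seq = cdna_5p.upper()
    return [(k, seq[k:]) for k in range(1, leading_g_run(seq) + 1)]


if __name__ == "__main__":
    # One leading dG: a single interpretation.
    print(candidate_start_sites("GACTT"))
    # Three leading dG: three interpretations of the start site.
    print(candidate_start_sites("GGGACT"))
```

With a single leading dG the start site is unambiguous in this model, which is consistent with the stated preference for conditions that add one extra dG.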

In addition, the addition of 3-5 dCs was described in the template-switching method (Non-patent Document 3), but we could not observe such addition of multiple dCs under the conditions in Examples of this invention. Also, there is a report that one dC was preferentially added to the 3' end of the first-strand cDNA by the terminal transferase-like activity of reverse transcriptase (Schmidt, W.M. and Muller, M.W., Nucleic Acids Res. 27:e31, 1999), but there is no report about its mechanism. Furthermore, there is a report that, when reverse transcriptase acted on an RNA/DNA heteroduplex (corresponding to the cap structure-free RNA/DNA heteroduplex in this invention), one nucleotide ("dA" or "dG" or "dC" or "dT") was added to 90% of the 3' ends of DNA (Chen, D. and Patton, J.T., BioTechniques 30:574-582, 2001), but such addition was seldom observed under the conditions in Examples of this invention.

In this invention, the term "nucleotide" means a phosphoester (ATP, GTP, CTP, UTP; or dATP, dGTP, dCTP, dTTP) of a nucleoside that contains a sugar linked to purine or pyrimidine via a beta-N-glycosidic bond. Hereafter these nucleotides can be described simply by "A", "G", "C", "U", or "dA", "dG", "dC", "dT". The term "complementary" means a pairing of the nucleotides via hydrogen bonds: "A" (or "dA") and "U" (or "dT"), or "G" (or "dG") and "C" (or "dC").

The term "double-stranded DNA primer" means double-stranded DNA in which one end of a DNA strand is a 3'-protruding end whose sequence is complementary to that of template mRNA. This protruding part hybridizes to the template mRNA, and works as a primer for the first-strand cDNA synthesis by reverse transcriptase. In particular, double-stranded DNA with a replication origin is called a "vector primer".

Furthermore, in this invention, the term "mRNA possessing a cap structure" means mRNA that is transcribed from genome DNA and whose 5' end is linked to guanosine possessing methylated guanine (G) via a 5'-5' triphosphate bond (mGp5'-5'pp). For example, in the case of a cap structure in which the seventh position of G is methylated, the mRNA has the following structure:

5'-m7G N1 N2 N3 N4 N5 ... Nm-3' : (a)

(where N is A, G, C, or U, and m is a positive integer greater than 50)

In this invention, the term "cDNA possessing a consecutive sequence starting with a nucleotide adjacent to a cap structure of mRNA" means all cDNAs including the following cDNAs.

cDNA (the first-strand cDNA) complementary to the sequence N1 ... Nm in the foregoing structure (a) of mRNA, wherein 5'-dC(dA)n-3' (n=0-5) (more generally, 5'-dC(dN)n-3', where dN is dA or dC, n=0-5) is added to the 3' end:

3'-(dA)ndC dN1 dN2 dN3 dN4 dN5 ... dNm-5' : (b)

(where dN is dA, dG, dC, or dT),

cDNA (the second-strand cDNA) complementary to this cDNA (b):

5'-(dT)ndG dN1 dN2 dN3 dN4 dN5 ... dNm-3' : (c),

and a cDNA (b)/(c) duplex (double-stranded cDNA). The term "cDNA" used alone means double-stranded cDNA, but it means the second-strand cDNA described by the foregoing structure (c) when its sequence is referred to.
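The relation between structures (a), (b) and (c) can be sketched as string operations. This is an illustration under stated assumptions only: the mRNA is given as the sequence N1 ... Nm without the cap, the cap base is G so that dC(dA)n is appended, both DNA strands are returned written 5' to 3', and the short example sequence (far shorter than m > 50) and the function names are hypothetical.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand(mrna: str, n: int = 0) -> str:
    """Structure (b), written 5'->3'.

    The strand is the DNA reverse complement of N1..Nm followed at its
    3' end by dC(dA)n, the addition described for a cap base of G.
    """
    if not 0 <= n <= 5:
        raise ValueError("n must be in the range 0-5")
    body = "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))
    return body + "C" + "A" * n


def second_strand(first: str) -> str:
    """Structure (c), written 5'->3': reverse complement of structure (b)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first.upper()))


if __name__ == "__main__":
    mrna = "AUGGC"                      # hypothetical N1..N5
    b = first_strand(mrna, n=1)
    c = second_strand(b)
    print(b)   # first strand, 5'->3'
    print(c)   # second strand begins with (dT)1 dG followed by dN1...
```

Reading the returned first strand from its 3' end gives (dA)n, dC, dN1 and so on, which is the order shown in structure (b); the second strand then begins with (dT)n dG, as in structure (c).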

Since N1 in the structure (a) of mRNA is a nucleotide corresponding to a transcription start site, "cDNA possessing a consecutive sequence starting with a nucleotide adjacent to a cap structure of mRNA" can also be defined as "cDNA possessing a consecutive sequence starting with a transcription-start-site nucleotide".

Hereafter "cDNA possessing a consecutive sequence starting with a nucleotide (a transcription-start-site nucleotide) adjacent to a cap structure of mRNA" can be described as "cap-consecutive cDNA". Also, in particular, cap-consecutive cDNA possessing a poly(A) sequence can be described as "full-length cDNA". Furthermore, cDNA that does not contain a nucleotide (at least dN1 in the structure (b) or (c)) adjacent to a cap structure can be described as "cap-nonconsecutive cDNA". Still furthermore, "mRNA possessing a cap structure" can be described as "cap(+)mRNA", "mRNA not possessing a cap structure" can be described as "cap(-)mRNA", and mRNA that is produced by removing a cap structure from cap(+)mRNA can be described as "decapped mRNA".

Other terms and concepts according to this invention will be defined in detail by referring to the embodiments of the invention and Examples. In addition, the various techniques to be used for carrying out this invention can be easily and reliably carried out by those skilled in the art on the basis of known literature and the like, except for the techniques whose references are particularly specified. For example, the techniques of genetic engineering and molecular biology of this invention are described in Sambrook and Maniatis, in Molecular Cloning - A Laboratory Manual, Cold Spring Harbor Press, New York, 1989; Ausubel, F.M. et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y. 1995, and the like.

Brief Description of Drawings

Figure 1 is a diagram schematically showing basic processes of this invention. 

Figure 2 is a view exemplifying the general structure of a vector primer of this invention.

Figure 3 is a view schematically showing the structures of pGCAP1 and a pGCAP1-derived vector primer of this invention.

Figure 4 is a view schematically showing the structures of pGCAP10 and a pGCAP10-derived vector primer of this invention.

Best Mode for Carrying Out the Invention 

The first invention is a method for synthesizing cap-consecutive cDNA (the first-strand cDNA), which method indispensably comprises the following processes (i), (ii), and (iii) (refer to Figure 1).

Process (i): to anneal a double-stranded DNA primer and an RNA mixture containing mRNA possessing a cap structure.

Process (ii): to prepare a conjugate of an mRNA/cDNA heteroduplex and the double-stranded DNA primer by synthesizing the first-strand cDNA primed with the double-stranded DNA primer using reverse transcriptase.

Process (iii): to circularize the conjugate of the mRNA/cDNA heteroduplex and the double-stranded DNA primer by joining the 3' and 5' ends of the DNA strand containing cDNA using ligase.

In the process (i), an "RNA mixture containing mRNA possessing a cap structure" may contain only mRNA possessing a cap structure, or may also contain others such as "cap(-)mRNA" and/or "other RNA molecules (for example, rRNA, tRNA etc.)". Such an RNA mixture may be derived from either unicellular eukaryotes or multicellular eukaryotes. Also, this RNA mixture may be either RNA synthesized by in vitro transcription using DNA as a template or total RNA extracted from cells. In this process (i), although less than one µg of total RNA can be used as the RNA mixture to synthesize cDNA, more than one µg of total RNA is preferably used. Although the mRNA content in total RNA extracted from cells is only 2-3%, the method of this invention enables us to synthesize cap-consecutive cDNA or full-length cDNA from total RNA containing such a low amount of mRNA.
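To give a sense of scale for "such a low amount of mRNA", the following sketch converts a mass of total RNA into the corresponding range of mRNA mass using the 2-3% content stated above. The function name and the example input masses are assumptions chosen for illustration.

```python
def mrna_mass_range_ng(total_rna_ug: float,
                       low_fraction: float = 0.02,
                       high_fraction: float = 0.03) -> tuple[float, float]:
    """Return the (low, high) mRNA mass in nanograms for a given total RNA mass.

    The default fractions are the 2-3% mRNA content quoted in the text;
    1 microgram equals 1000 nanograms.
    """
    if total_rna_ug < 0:
        raise ValueError("total RNA mass must not be negative")
    return (total_rna_ug * low_fraction * 1000.0,
            total_rna_ug * high_fraction * 1000.0)


if __name__ == "__main__":
    for mass_ug in (1.0, 5.0):          # hypothetical input amounts
        low, high = mrna_mass_range_ng(mass_ug)
        print(f"{mass_ug} ug total RNA -> about {low:.0f}-{high:.0f} ng mRNA")
```

On these figures, one microgram of total RNA corresponds to only a few tens of nanograms of mRNA template.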

A "double-stranded DNA primer" used in the process (i) has a "primer sequence" at its 3'-protruding end. When information on a partial sequence of the target mRNA possessing a cap is known, the primer sequence can be prepared based on this known sequence using known chemical synthetic methods (for example, methods described in Carruthers, Cold Spring Harbor Symp. Quant. Biol. 47:411-418, 1982; Adams, J. Am. Chem. Soc. 105:661, 1983; Belousov, Nucleic Acids Res. 25:3440-3444, 1997; Frenkel, Free Radic. Biol. Med. 19:373-380, 1995; Blommers, Biochemistry 33:7886-7896, 1994; Narang, Meth. Enzymol. 68:90, 1979; Brown, Meth. Enzymol. 68:109, 1979; Beaucage, Tetra. Lett. 22:1859, 1981; US patent 4,458,066). Most known EST (expressed sequence tag) sequences were derived from a 3'-end partial sequence of cDNA. Using a primer sequence prepared based on these EST sequences, we can obtain cap-consecutive cDNA containing the 5'-upstream region of the corresponding EST sequence. A primer containing an oligo dT that is complementary to the poly(A) of mRNA can also be used. The number of consecutive dT residues composing the oligo dT is preferably 30-70. Using these oligo dT primers, cap-consecutive cDNA extending to a poly(A) site (full-length cDNA) can be obtained. On the other hand, there is no restriction on the other terminal sequence of the double-stranded DNA primer, so that any double-stranded DNA can be used, but preferably it should possess a linking terminal for joining to the cloning site of a vector DNA so that it can easily be inserted into the vector DNA in a later process.

In the process (ii), by allowing reverse transcriptase to act on the double-stranded DNA primer annealed with mRNA, a cDNA strand complementary to mRNA is synthesized in the 5' direction of mRNA starting from the 3' end of the double-stranded DNA primer. This process produces a conjugate of an mRNA/cDNA heteroduplex and a double-stranded DNA primer. In this conjugate, one end of the cDNA strand in the mRNA/cDNA heteroduplex is joined to the end of one strand of the double-stranded DNA primer. Furthermore, cDNA in the mRNA/cDNA heteroduplex produced by this process (ii) is either cap-consecutive cDNA that is complementary to cap(+)mRNA and possesses dC or 5'-dC(dA)n-3' at its 3' end, or cap-nonconsecutive cDNA that is derived from cap(-)mRNA.

As reverse transcriptase, an enzyme derived from M-MLV (Moloney murine leukemia virus) or AMV (avian myeloblastosis virus) can be used, but one from which the endogenous RNase H activity has been removed is preferable.

As the ligase in the process (iii), any of various kinds of DNA ligase or RNA ligase can be used, but T4 RNA ligase is preferable. There is a report on a method to join two oligodeoxynucleotides hybridized to RNA using T4 RNA ligase (US patent 6,368,801), but there is no report on ligation between an mRNA/cDNA heteroduplex and a double-stranded DNA. The ligation is preferably performed after carrying out a process (ii') by which the end of the double-stranded DNA primer is converted to a 5'-protruding end or a blunt end by cutting the conjugate of the mRNA/cDNA heteroduplex and the double-stranded DNA primer with a restriction enzyme. The disadvantage of adding one process is compensated by two merits: a decrease in background consisting of vector alone without a cDNA insert, and an increase in ligation efficiency. The cap structure of the mRNA/cDNA heteroduplex may or may not be removed, for example, using tobacco acid pyrophosphatase. Furthermore, the ligation may be performed after degradation of the mRNA in the mRNA/cDNA heteroduplex or replacement of the mRNA strand by a DNA strand. In this case, it should be noted that degradation products of mRNA generated during this process can be added to the 3' end of the cDNA.

The foregoing method enables us to synthesize a circular DNA strand that contains cap-consecutive cDNA (the first-strand cDNA) described by the following:

3'-(dA)ndC dN1 dN2 dN3 dN4 dN5 ... dNm-5' : (b)

The obtained cap-consecutive cDNA provides information to specify the transcription start site of a gene and the expression regulation region upstream of it, by analyzing the sequence of the cDNA and comparing it with a genome sequence.

The cap-consecutive cDNA (the first-strand cDNA) obtained using the foregoing method is converted to double-stranded cDNA by the process (iv) of the second invention. This process (iv) can be performed by replacing an RNA strand with a DNA strand, for example, by the action of RNase H, E. coli DNA polymerase I, E. coli DNA ligase, and the like. This process need not be done in vitro; for example, if a "vector primer" is used as the double-stranded DNA primer, the RNA strand can be replaced with the DNA strand in cells such as E. coli cells after the ligation product is introduced into the cells.

The method of the second invention enables us to clone the double-stranded cDNA into a vector by the process (v). For example, after cutting at restriction enzyme sites that are set up in the double-stranded DNA, the double-stranded cDNA can be inserted into a plasmid vector or phage vector, and then used for sequencing analysis or production of its expression product.

In the method of this second invention, the use of a "vector primer" as the double-stranded DNA primer is preferable because the process for inserting the cDNA into another vector can be omitted. The vector primer can be prepared by cutting a circular DNA vector at an appropriate site with a restriction enzyme and then joining a 3'-protruding primer sequence that is complementary to a part of mRNA. Also, in order to synthesize full-length cDNA, oligo dT (preferably 30-70 nucleotides) may be joined at the 3' end. In particular, the double-stranded DNA primer possessing the oligo dT as a 3'-protruding end is preferable, for example, to efficiently prepare a full-length cDNA library. Also, in order to make it easy to recombine the excised cDNA into another vector, it is preferable to set up an 8-base recognition restriction enzyme site in the double-stranded DNA. Furthermore, it is preferable to set up a replication origin in the double-stranded DNA. As the replication origin, those that function in a prokaryotic cell such as E. coli or in eukaryotic cells such as yeast, insect cells, mammalian cells, plant cells and the like can be used. This enables us to replicate the obtained cDNA vector after introduction into these cells. Furthermore, it is preferable to set up a promoter, a splicing region, a poly(A) addition signal and the like in the double-stranded DNA in order to express the cDNA in vitro by in vitro transcription/translation or in vivo by introduction into eukaryotic cells.

These double-stranded DNA primers (the fifth invention) may be designed
using an appropriate vector DNA as a starting material, or a known vector
primer such as the pKA1 vector primer (one end possesses a 3'-protruding dT tail of
about 60 nucleotides, and the other end is an EcoRV blunt end [Kato et al., Gene
150:243-250, 1994]) or the like can be used. The general structure of the
double-stranded vector primer of this invention is shown in Figure 2. This vector
primer has 60 +/- 10 dTs as a primer sequence. The other terminal may be either a blunt
end or a protruding end. It is preferable that the end on the primer side contains an
8-base recognition restriction enzyme site RE1 and that the other end has an 8-base
recognition restriction enzyme site RE2 and a restriction enzyme site RE3 generating a
5'-protruding end or a blunt end. As an 8-base recognition restriction enzyme site,
NotI, Sse8387I, PacI, SwaI, SfiI, SgrAI, AscI, FseI, PmeI, SrfI or the like can be used.
The vector primer pGCAP1 prepared in this invention has an AflII site (CTTAAG) as RE3.
If the 3' end of the first-strand cDNA is joined to the 5'-protruding end of this AflII
site (...CTTAA), then in the case of "cap-consecutive cDNA" a "dG" is added to the
5'-protruding end, generating ...CTTAAG..., that is, restoring the AflII site. Thus,
"cap-consecutive cDNA" clones can be cut with AflII, and AflII digestion can be used
to determine whether or not cap-consecutive cDNA has been synthesized. In addition,
MunI (CAATTG), XhoI (CTCGAG) or the like can be used as a restriction site whose
recognition sequence is restored by the addition of "dG". The pGCAP10 vector primer
prepared in this invention possesses NotI as RE1, SwaI as RE2, and EcoRI as RE3.
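
The site-restoration logic described above can be sketched in a few lines of Python.
This is only an illustration and not part of the described method: the function name
`restores_site` is hypothetical, and the vector end is modelled simply as the string of
bases left after cutting, read on the strand to which the "dG" is joined.

```python
def restores_site(cut_end: str, added_base: str, recognition_site: str) -> bool:
    """Return True if joining added_base to cut_end recreates the recognition site."""
    junction = cut_end.upper() + added_base.upper()
    return junction.endswith(recognition_site.upper())


# AflII recognizes CTTAAG; the cut vector end described in the text is ...CTTAA.
print(restores_site("CTTAA", "G", "CTTAAG"))  # cap-consecutive cDNA contributes dG
print(restores_site("CTTAA", "A", "CTTAAG"))  # a different first base
# XhoI recognizes CTCGAG; its cut end ...CTCGA can be checked the same way.
print(restores_site("CTCGA", "G", "CTCGAG"))
```

Under this simple model, only a junction that receives "dG" regenerates the site, which
is why digestion with the RE3 enzyme can serve as a test for cap-consecutive clones.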

The third invention of this application is a "cDNA library" composed of a
population of cDNA vectors that is the final product prepared by the method of the
foregoing second invention. As shown in the Examples below, the cDNA library is
characterized by containing cap-consecutive cDNA clones at extremely high rates: more
than 60%, preferably more than 75%, more preferably more than 90%, and most
preferably more than 95%.
Accordingly, the cDNA library of this third invention makes it possible to isolate and
analyze cap-consecutive cDNA at high rates without selection, because most of the
clones in the library are cap-consecutive cDNAs. For greater accuracy, the
cap-consecutive cDNA can be correctly selected by the method of the fourth invention of
this application. Since the cap-consecutive cDNA synthesized by the method of the
foregoing first and second inventions is characterized by the presence of "(dT)ndG" at
the 5' end, the cap-consecutive cDNA can be identified by examining the presence of this
"(dT)ndG" as an indicator without determining the entire nucleotide sequence of the
cDNA. The presence of "(dT)ndG" can be examined using known methods for
nucleotide sequencing. Since more than 90% of cap-consecutive cDNAs start with
"dG", the presence of "dG" can be used in practice as an indicator.

The sixth invention of this application is a set of the minimum reagents necessary for
synthesizing cDNA by the method of this invention, that is, a cDNA synthesis
reagent kit comprising a double-stranded DNA primer, reverse transcriptase and its
reaction buffer solution, T4 RNA ligase and its reaction buffer solution, and model RNA
possessing a cap. By using this kit, a cDNA library containing cap-consecutive
cDNAs can be easily prepared from a given RNA mixture containing mRNA possessing
a cap.



Examples 

Hereunder, the invention of this application will be explained in more detail and
more specifically by showing Examples; however, the invention of this application is not
intended to be limited to these Examples. Basic procedures and
enzymatic reactions related to DNA recombination followed the literature (Sambrook
and Maniatis, in Molecular Cloning - A Laboratory Manual, Cold Spring Harbor
Laboratory Press, New York, 1989). Restriction enzymes and various
modification enzymes were those manufactured by Takara Shuzo Co. Ltd.
unless otherwise stated. The composition of the buffer solution for each
enzymatic reaction and the reaction conditions followed the attached instructions.

Example 1 

cDNA synthesis using cap analogue-attached RNA

(1) Preparation of cap analogue-attached RNA 

A full-length cDNA clone of human elongation factor-1α (EF-1α), pHP00155
(Non-patent Document 6), was linearized by digestion with NotI and then used
as a template to prepare mRNA using an in vitro transcription kit (Ambion). By
adding m7G(5')pppG(5') or A(5')pppG(5') to the reaction solution as a cap analogue,
model mRNA possessing "m7G" or "A" as a cap structure was obtained. In addition,
by omitting the cap analogue, model mRNA without a cap structure was obtained.
The 5'-terminal sequence of the in vitro transcription product is the sequence derived
from the vector (5'-GGGAATTCGAGGA-3') followed by the 5'-terminal sequence of
EF-1α (5'-CTTTTTCGCAA).

(2) Synthesis of the first-strand cDNA 

The first-strand cDNA complementary to the model mRNA was synthesized by
mixing 0.3µg of the foregoing model mRNA and 0.3µg of a pKA1 vector primer (one
end has a 3'-protruding dT tail of about 60 nucleotides and the other end is an
EcoRV blunt end) (Non-patent Document 6) in a reaction solution (50mM Tris-HCl,
pH8.3, 75mM KCl, 3mM MgCl2, 5mM DTT, 1.25mM dNTP), annealing the model
mRNA and the vector primer, adding 200U of reverse transcriptase SuperScript™ II
(Invitrogen) and 40U of ribonuclease inhibitor (Takara Shuzo), and incubating at 42°C
for 1 hour. After the reaction solution was extracted with phenol, a conjugate of the
cap(+) mRNA/cDNA heteroduplex and the vector primer was recovered by ethanol
precipitation and then dissolved in 20µl of water.

(3) Decapping reaction

The cap structure of the mRNA was removed by mixing 20µl of the
cap(+) mRNA/cDNA heteroduplex solution with a reaction solution (50mM sodium acetate,
pH5.5, 5mM EDTA, 10mM 2-mercaptoethanol), adding 10U of tobacco acid
pyrophosphatase (Nippon Gene), and incubating at 37°C for 30 minutes. After the
reaction solution was extracted with phenol, a conjugate of the decapped mRNA/cDNA
heteroduplex and the vector primer was recovered by ethanol precipitation and then
dissolved in 20µl of water.

(4) Self-ligation 

The end of the mRNA/cDNA heteroduplex and the EcoRV end of the vector
primer were ligated to circularize the molecule (self-ligation reaction) by mixing 20µl
of either the cap(+) mRNA/cDNA heteroduplex solution obtained in (2) above or the
decapped mRNA/cDNA heteroduplex solution obtained in (3) above with a
reaction solution (50mM Tris-HCl, pH7.5, 5mM MgCl2, 10mM 2-mercaptoethanol,
0.5mM ATP, 2mM DTT), adding 120U of T4 RNA ligase (Takara Shuzo), and
incubating at 20°C for 16 hours. After the reaction solution was extracted with phenol,
the self-ligation product was recovered by ethanol precipitation and then dissolved in
20µl of water.

(5) Replacement of RNA strand with DNA strand

The second-strand cDNA was synthesized by replacing the RNA strand with a DNA
strand, yielding a vector (cDNA vector) carrying a cDNA/cDNA duplex insert. The
replacement reaction was carried out by mixing 20µl of the
self-ligation product with a reaction solution (20mM Tris-HCl, pH7.5, 4mM MgCl2,
10mM (NH4)2SO4, 100mM KCl, 50µg/ml BSA, 0.1mM dNTP), adding 0.3U of
RNaseH (Takara Shuzo), 4U of E. coli DNA polymerase I (Takara Shuzo), and 60U of
E. coli DNA ligase (Takara Shuzo), and incubating at 12°C for 5 hours. After the
reaction solution was extracted with phenol, the cDNA vector was recovered by ethanol
precipitation and then dissolved in 40µl of TE.

(6) Transformation of E. coli

Transformation was carried out by electroporation after mixing
of the cDNA vector solution with DH12S competent cells (Invitrogen). The
electroporation was carried out using a MicroPulser (BioRad). The obtained
transformants were suspended in SOC medium, seeded on agar plates containing
100µg/ml ampicillin, and incubated at 37°C overnight. As a result, a library composed
of about 10^5 to 10^6 E. coli transformants was obtained.

(7) Analysis of 5'-end nucleotide sequence of cDNA clones

Colonies grown on the agar plate were picked, suspended in LB medium
containing 100µg/ml ampicillin, and incubated at 37°C overnight. After cells were
harvested from the culture medium by centrifugation, plasmid DNA was isolated and
purified from the cells by the alkaline/SDS method. This plasmid was used as a
template for a cycle sequencing reaction using a kit (BigDye Terminator v3.0, ABI), and
the 5'-end nucleotide sequence of the cDNA was determined with a fluorescent DNA
sequencer (ABI).

When model mRNA prepared using m7G(5')pppG(5') as a cap analogue was used
as a template, 15 clones out of 20 clones carrying a cDNA insert contained
cap-consecutive cDNA. With regard to these clones, an extra "dG" (12 clones) or an extra
"dTdG" (1 clone) not existing in the model mRNA was added before the dG of the
transcription start site. With regard to the remaining clones, 2 clones had no extra
"dG" and 5 clones were cap-nonconsecutive cDNA starting in the middle of the mRNA.
Incidentally, the decapping reaction did not influence the number of transformants
obtained, the ratio of cDNA starting with the cap site, or the addition of the extra "dG".

On the other hand, when model mRNA prepared using A(5')pppG(5') as a cap
analogue was used as a template, 18 clones out of 24 clones carrying a cDNA insert
contained cap-consecutive cDNA. With regard to these clones, an extra "dA" (15 clones)
or an extra "dTdA" (1 clone) not existing in the model mRNA was added before the dG of
the transcription start site. With regard to the remaining clones, 2 clones had no extra
"dG" and 6 clones were cap-nonconsecutive cDNA starting in the middle of the mRNA.

In addition, in both cases, the addition of an extra "dG" or "dA" not existing in the
model mRNA was not observed in clones possessing cap-nonconsecutive cDNA.

Furthermore, when model mRNA without a cap structure was used as a template,
16 clones out of 19 clones carrying a cDNA insert contained the sequence starting with
the transcription start site. Of these, 14 clones did not possess an extra sequence
before the dG of the transcription start site. However, in 2 clones, an extra
"dT" not existing in the model mRNA was added before the dG of the transcription start
site. With regard to the remaining clones, 3 clones were cap-nonconsecutive cDNA
starting in the middle of the mRNA.

These results suggest that, in the method of this invention, a nucleotide "dC"
complementary to the base "G" of the cap structure of the template mRNA is added to
the first-strand cDNA, and that this added 3'-end "dC" of the first-strand cDNA
results in the addition of the complementary "dG" to the second-strand cDNA.
Furthermore, the complementary "dG" was sometimes followed by dT. Therefore, the
addition of "dG" or "dTdG" to the 5' end of the cDNA indicates that the cDNA is
cap-consecutive cDNA.
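
The base-pairing argument above can be written out as a small Python sketch. It only
illustrates the proposed interpretation: the helper name `extra_5prime_base` is
hypothetical, and the model assumes that the cap base is copied once during
first-strand synthesis and that the added nucleotide is copied back during
second-strand synthesis.

```python
# Watson-Crick partners, written as DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extra_5prime_base(cap_base: str) -> str:
    """Predict the extra base at the 5' end of the second-strand cDNA."""
    first_strand_addition = COMPLEMENT[cap_base]  # e.g. cap G -> dC on the first strand
    return COMPLEMENT[first_strand_addition]      # copied back on the second strand


for cap in ("G", "A"):
    print(f"cap {cap}: first strand gains d{COMPLEMENT[cap]}, "
          f"second strand gains d{extra_5prime_base(cap)}")
```

Under this model an m7G cap predicts an extra "dG" and an A cap predicts an extra "dA",
which agrees with the clones observed in (7).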


(8) cDNA synthesis using vector primer with protruding end 

When the self-ligation reaction was performed after synthesizing the first-strand
cDNA and generating a 5'-protruding end by cutting the vector primer with EcoRI, a
cap-consecutive cDNA clone possessing "dG" at the 5' end was obtained, just as with
the vector primer carrying the blunt EcoRV end. Therefore, the restriction
enzyme-cut end of the pKA1 vector primer can be not only a blunt end but also a
5'-protruding end. Furthermore, the use of the 5'-protruding end improved the
efficiency of ligation and increased the number of clones in the cDNA library as
compared with the blunt end.

Example 2 

Preparation of cDNA library using mRNA derived from cultured cell HT-1080 

Total RNA was prepared from human fibrosarcoma cell line HT-1080 (purchased
from Dainippon Pharmaceutical Co. Ltd.) using the AGPC method (a kit from Nippon
Gene). Poly(A)+ RNA was purified by binding to a biotinylated oligo(dT) primer
(Promega), adding Streptavidin MagneSphere Particles, and collecting with a magnet. A
cDNA library was prepared by synthesizing cDNA under the same conditions as
described in Example 1 using 0.3µg of poly(A)+ RNA and 0.3µg of a pKA1 vector
primer, and transforming E. coli. As a result, a library containing
about 10^5 to 10^6 transformants was obtained. Libraries were prepared with and without
the decapping reaction, but there was no significant difference between the analysis
results of the two libraries. Thus, only the results obtained without the decapping
reaction are described hereafter.

The 5'-end nucleotide sequence of the cDNA was determined using plasmids
isolated from colonies that were randomly selected from the foregoing library. For
191 clones carrying a cDNA insert whose sequence was determined, a BLAST
search was performed against the GenBank nucleotide sequence database, showing that
189 of them had been registered in the database as genes derived from mRNA. Of
these, 178 clones, accounting for 94% of the total clones, contained a coding region.
The most abundant clones were those encoding ribosomal protein P1 and elongation
factor 1-α, with 5 clones obtained for each. The 5'-end nucleotide sequences of these
5 clones were all 5'-GCCCTTTCCTCAGCTGCCGC... for ribosomal protein P1 and all
5'-GCTTTTTCGCAACGGGTTTG... for elongation factor 1-α. Except for the "dG" at the
5' end, these sequences were identical to those of clones (Non-patent Document
6) obtained from a library prepared by the conventional method (the DNA-RNA
chimera oligo capping method). Comparison with the genome sequence showed that
the "dG" at the 5' end did not exist in the genome sequence, confirming that
the "dG" was added during cDNA synthesis. This was also
supported by the fact that 168 clones out of the 178 clones containing a coding region
started with "dG". Furthermore, 6 clones started with (dT)ndG (n=1-5). These
clones may have been produced by further addition of multiple "dT" to the added "dG".
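
Screening 5'-end reads for the added "dG" or "(dT)ndG" can be sketched as follows. This
is a minimal illustration under stated assumptions: the function name
`has_cap_indicator` is hypothetical, each read is assumed to begin at the first
nucleotide of the insert, and n is limited to the range 1-5 reported above. A match only
shows that the indicator is present; as described above, comparison with the genome
sequence is still needed to confirm that a leading "dG" was added rather than templated.

```python
import re

# Optional run of up to five T (the "(dT)n" part) followed by G at the very 5' end.
CAP_INDICATOR = re.compile(r"^T{0,5}G")


def has_cap_indicator(read: str) -> bool:
    """Return True if the 5'-end read starts with dG or (dT)n-dG, n = 1-5."""
    return CAP_INDICATOR.match(read.upper()) is not None


reads = {
    "ribosomal protein P1": "GCCCTTTCCTCAGCTGCCGC",
    "elongation factor 1-alpha": "GCTTTTTCGCAACGGGTTTG",
    # Hypothetical read lacking the added dG, included only for contrast.
    "elongation factor 1-alpha, no added dG": "CTTTTTCGCAACGGGTTTG",
}
for name, seq in reads.items():
    print(f"{name}: {has_cap_indicator(seq)}")
```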

Two clones had not yet been registered in the database as genes derived from
mRNA, but the sequences of these clones completely agreed with a part of the genome
sequence and with some sequences in the EST database. Both sequences had the added
"dG" not existing in the genome sequence. Therefore, these 2 clones are likely to be
novel full-length cDNAs that have not yet been identified as genes.

Eleven clones started in the middle of the mRNA (cap-nonconsecutive cDNA).
Three of these cDNA clones started with "dG" at the 5' end, but these "dG"
were derived from the corresponding mRNA, and no clones possessing a newly added
"dG" were observed.

From the above results, 180 clones out of the 191 clones carrying a cDNA insert
appear to be full-length (cap-consecutive cDNA), so the overall full-length rate is
calculated to be 94%. Since 3 clones out of the 171 clones starting with "dG" at the 5'
end were cap-nonconsecutive cDNA, in this cDNA library a cDNA clone starting with
"dG" at the 5' end and possessing a coding region is full-length cDNA with a
probability of 98%. In particular, clones starting with a "dG" that does not exist in
the genome sequence are almost certainly
full-length cDNA.
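
The two percentages above follow directly from the clone counts reported in this
Example. The short Python check below reproduces the arithmetic; the variable names are
illustrative only.

```python
total_clones = 191        # clones carrying a sequenced cDNA insert
cap_nonconsecutive = 11   # clones starting in the middle of the mRNA

full_length = total_clones - cap_nonconsecutive
full_length_rate = full_length / total_clones

dg_full_length = 168      # coding-region clones starting with an added dG
dg_nonconsecutive = 3     # cap-nonconsecutive clones starting with a templated dG
dg_total = dg_full_length + dg_nonconsecutive
dg_full_length_probability = dg_full_length / dg_total

print(f"full-length clones: {full_length} of {total_clones} ({full_length_rate:.0%})")
print(f"dG-start clones that are full-length: {dg_full_length} of {dg_total} "
      f"({dg_full_length_probability:.0%})")
```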

Example 3

Preparation of cDNA library using total RNA derived from cultured cell HT-1080

A cDNA library was prepared by synthesizing cDNA using 5µg of total RNA
prepared from human fibrosarcoma cell line HT-1080 and 0.3µg of a vector primer
under the same conditions as described in Example 1 (except that the decapping
reaction was omitted), and transforming E. coli cells. As a result, a library containing
about 10^5 transformants was obtained.

The 5'-end partial sequences of cDNA clones in this library were analyzed as
described in Example 2. For 222 clones whose sequences could be
determined, a BLAST search against the GenBank nucleotide sequence database showed
that 217 clones had been registered in the database as genes derived from mRNA.
Of these, 209 clones, accounting for 94% of the total clones, contained a coding
region. Of these clones, 189 started with "dG". It should be noted that this
library was prepared from total RNA rather than from purified poly(A)+ RNA.
Furthermore, only a small amount (5µg) of total RNA was used. Therefore, with this
method, the poly(A)+ RNA purification step can be omitted and a high-quality
full-length cDNA library can be prepared from several µg of total RNA.

Example 4 

Large-scale sequencing analysis of full-length cDNA library of cultured cell ARPE-19

A cDNA library was prepared by synthesizing cDNA using 2.5µg of poly(A)+ RNA
prepared from human retinal pigment epithelium cell line ARPE-19 (obtained
from ATCC) and 0.7µg of a vector primer under the same conditions as described in
Example 1, and transforming E. coli cells. The 5'-end partial sequences of cDNA
clones in this library were analyzed as described in Example 2. For 3683
clones whose sequences could be determined, a BLAST search against the GenBank
nucleotide sequence database showed that 3662 clones had been registered in the
database as genes derived from mRNA. Of these, 3474 clones, accounting for 94% of
the total clones, were full-length cDNA clones. Of these clones, 3069 started
with "dG" or "(dT)ndG".

Example 5 

Preparation of pGCAP1 vector primer

pGCAP1 was prepared using the multifunctional cloning vector pKA1 (Non-patent
Document 6) as a starting material. Figure 3A shows its structure, and SEQ
ID NO: 1 in the sequence listing shows its entire nucleotide sequence. The differences
from pKA1 are (1) replacement of the replication origin with one derived from pUC19,
(2) addition of a PacI site upstream of the HindIII restriction enzyme site of pKA1, and
(3) replacement of the EcoRI-BstXI-EcoRV-KpnI site with an EcoRI-AflII-SwaI-KpnI
site. The first nucleotide "A" in the sequence of SEQ ID NO: 1 corresponds to the
HindIII site and the 568th to the EcoRI site.

After 100µg of pGCAP1 was completely digested with 200U of KpnI, a fragment
was isolated by 0.8% agarose gel electrophoresis. By adding 375U of terminal
transferase (Takara Shuzo) to 70µg of the obtained fragment in the presence of 20µM
dTTP and incubating at 37°C for 30 minutes, a dT tail of about 60 nucleotides was
added to the 3'-protruding end generated by KpnI digestion. The reaction product
was then digested with SwaI and the longer fragment was isolated by 0.8% agarose gel
electrophoresis. This was used as the pGCAP1 vector primer (Figure 3B).



Example 6 

Preparation of cDNA library using pGCAP1 vector primer

A cDNA library was prepared by synthesizing cDNA under the same conditions
as described in Example 3 using 5µg of total RNA of human fibrosarcoma cell line
HT-1080 and 0.3µg of the pGCAP1 vector primer prepared in Example 5, and
transforming E. coli cells. As a result, a library containing about 2x10^5 transformants
was obtained. Analysis of the 5'-end partial sequences of the cDNA clones in this
library showed that full-length cDNAs possessing "dG" at the 5' end were
obtained and that the full-length rate was 95%, as in Example 3.

When the pKA1 vector primer is used, addition of one G to the EcoRV-cut end
(...GAT) generates ...GATG..., which contains a new initiation codon ATG.
This is not a problem when the purpose is to determine the sequence of the
cap-consecutive cDNA, but the presence of the extra ATG may interfere with correct
transcription/translation of the cDNA when the vector is used as an expression
vector. With the pGCAP1 vector primer, this problem does not occur
because the addition of one G to the SwaI-cut end (...ATTT) generates
...ATTTG..., which does not contain ATG.
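
The comparison of the two blunt ends can be checked with a short Python sketch. The
function name `creates_start_codon` is hypothetical. The vector ends are taken from the
recognition sequences of the enzymes (EcoRV cuts GAT^ATC, leaving ...GAT; SwaI cuts
ATTT^AAAT, leaving ...ATTT), and the model considers only the junction formed with the
added G.

```python
def creates_start_codon(vector_end: str, added_base: str = "G") -> bool:
    """Return True if joining added_base to vector_end creates ATG at the junction."""
    junction = vector_end.upper() + added_base.upper()
    # A newly created ATG must include the added base, so it can only be
    # the last three bases of the junction.
    return junction.endswith("ATG")


print(creates_start_codon("GAT"))   # EcoRV end of the pKA1 vector primer
print(creates_start_codon("ATTT"))  # SwaI end of the pGCAP1 vector primer
```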

Furthermore, when self-ligation was carried out after synthesizing the first-strand
cDNA and generating a 5'-protruding end by cutting the vector primer with AflII,
cap-consecutive cDNA possessing "dG" at its 5' end could be obtained, as in the case of
the vector primer possessing the blunt SwaI end. When one G was added to the
AflII-cut end (...CTTAA), ...CTTAAG... was generated, restoring the
AflII recognition site. As a result, cap-consecutive cDNA can be cut with AflII.
Therefore, AflII digestion can be used to identify a cap-consecutive cDNA clone.

Example 7 

Expression profile analysis of cultured cell ARPE-19 full-length cDNA library

A cDNA library was prepared from 5µg of total RNA of human retinal pigment
epithelium cell line ARPE-19 as described in Example 4, and then the 5'-end partial
nucleotide sequences of the cDNA clones were analyzed. For 3204 clones
whose sequences could be determined, a BLAST search against the GenBank nucleotide
sequence database showed that 3038 clones, accounting for 95% of the total clones, were
full-length cDNA clones.

Examination of the insert size distribution of these full-length cDNA clones
showed that the clones contained inserts of a wide range of sizes, from 0.1 kbp for the
shortest to 10 kbp for the longest, with an average length of 1.94 kbp. Long
clones carrying an insert of more than 3 kbp accounted for 16% of the total clones.

These clones were classified into 1408 kinds of genes. The most abundant clone
was glyceraldehyde-3-phosphate dehydrogenase cDNA, with 44 clones
(1.4% of the total clones). Only 235 kinds of cDNAs showed an expression level of more
than 0.1%, that is, more than 2 clones each were obtained. On the other hand, 971
kinds of cDNAs (69% of the total genes) were genes whose expression level was less than
0.03%, because only one clone each was obtained. Furthermore, some clones appear to
be novel gene clones of very low expression level, whose sequences agree with the
genome sequence but have not yet been registered in the database. As shown above, the
obtained cDNA library was confirmed to be a low-redundancy, high-quality library
containing a large number of genes of low expression level.
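
The expression levels quoted above are clone frequencies, and the arithmetic can be
reproduced with the sketch below. It assumes that the denominator for per-clone
frequencies is the 3038 full-length clones; the text does not state the denominator
explicitly, and the variable and function names are illustrative only.

```python
full_length_clones = 3038  # full-length clones identified in this Example
gene_kinds = 1408          # distinct genes among those clones


def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


gapdh_level = percent(44, full_length_clones)     # most abundant cDNA
singleton_level = percent(1, full_length_clones)  # genes seen as a single clone
singleton_share = percent(971, gene_kinds)        # share of genes seen only once

print(f"GAPDH clone frequency: {gapdh_level:.1f}%")
print(f"single-clone frequency: {singleton_level:.3f}%")
print(f"genes represented by one clone: {singleton_share:.0f}%")
```

Under this assumption a single clone corresponds to a frequency of roughly 0.03%, which
is the threshold the text uses for genes of low expression level.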

The results of the above analysis suggest that the cDNA library prepared by the
method of this invention contains full-length cDNA clones at high rates and faithfully
reflects the expression levels of mRNA in the cells, because there is no bias with
respect to gene length or expression level. Therefore, this method is effective not only
for obtaining full-length cDNA clones but also for analyzing the expression profile of
genes expressed in the cells.






Example 8 

Preparation of pGCAP10 vector primer and preparation of cDNA library using it

pGCAP10 was prepared using pGCAP1 as a starting material. Figure 4A shows
its structure, and SEQ ID NO: 2 in the sequence listing shows its entire
nucleotide sequence. The difference from pGCAP1 is that the EcoRI-AflII-SwaI-KpnI
site is replaced with a SwaI-EcoRI-FseI-EcoRV-KpnI site. As described in Example 5,
a dT tail of about 60 nucleotides was added to the 3'-protruding KpnI end of pGCAP10.
After the reaction product was digested with EcoRV, the longer fragment was used as
the pGCAP10 vector primer (Figure 4B). Using
0.3µg of this vector primer and 5µg of total RNA of human fibrosarcoma cell line
HT-1080, the first-strand cDNA was synthesized under the same conditions as described
in Example 3, and then the vector-side end was converted to a 5'-protruding end
by EcoRI digestion. After self-ligation, the ligation product was used for
transformation of E. coli cells, omitting the step of replacing the mRNA strand with a
DNA strand. As a result, a cDNA library containing about 10^6 transformants was
obtained. Analysis of the 5'-end partial sequences of the cDNA clones in this library
showed that full-length cDNA with "dG" added at the 5' end
was obtained and that the full-length rate was 95%, as in Example 3.

Industrial Applicability 

As described in detail above, according to the invention of this application,
full-length cDNAs that are guaranteed to possess a consecutive sequence starting with
the nucleotide of a transcription start site can be synthesized from one to
several µg of total RNA at a high yield of more than 90% in a small number of steps
without using PCR. As a result, information on the primary structure of the protein
encoded by a gene and information on the expression regulatory region controlling the
expression of the gene can be reliably obtained, so that this invention contributes
greatly not only to the effective use of genome information but also to the production
of recombinant proteins useful in the medical field and the like.



